\section{Related work}\label{sec:related}
%As mentioned before, it is desirable to extend conventional
%programming languages to reflects new hardwares. Researches in the field have two major directions:
To meet the challenges of multicore, approaches based on existing programming languages can be broadly categorized into two classes: (A) library-based approaches, which
provide libraries to support parallel programs, and (B) compiler-based
approaches, which extend grammars or provide new language constructs for parallelism.

Library-based approaches extend the capability of programming languages
through well-developed libraries. Because both the development and the adoption of
these libraries build on the original language infrastructure, the
burden on programmers is relatively small. Many systems provide
such libraries to expose interfaces for parallel programs, such as Pthreads (POSIX Threads), MPI (Message Passing Interface), and OpenCL~\cite{opencl}. However, system-level libraries usually map directly onto
their underlying platforms. The abstractions
of those libraries are far from expressing parallelism
naturally, and it is difficult to
port programs from one library to another. Our library is a
metaprogramming library for source transformations. It contains template classes that abstract high-level parallel patterns
and executions, rather than the interfaces of system primitives or
resources. Besides system-level libraries, developing high-level
libraries to support parallelism and concurrency is also an active topic in
the programming language field~\cite{javacon, wincon}. The C++ community pursues
this goal with generic programming in mind. TBB~\cite{tbb} provides many
containers and building blocks to support loop parallelism and
task-level parallelism. Inspired by TBB's approach, we achieve
similar effects in the static domain. We aim to use static information
to tailor programs to different multicore architectures. Moreover, the template-based
approach we propose is orthogonal to runtime parallel libraries: we only exploit parallelism that can be resolved by compilers.
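
As a minimal sketch of what resolving a parallel pattern statically can look like, the following C++ fragment splits an iteration range into chunks whose boundaries are compile-time constants and hands each chunk to an execution policy. The names \texttt{StaticParallelFor}, \texttt{InlinePolicy}, and \texttt{sum\_of\_squares\_demo} are illustrative assumptions, not the interface of the library described in this paper, and the policy shown merely runs each chunk inline as a stand-in for a target-specific mapping.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical execution policy: runs one chunk inline on the calling core.
// A policy tailored to a specific multicore target would be substituted here.
struct InlinePolicy {
    template <typename F>
    static void run(std::size_t begin, std::size_t end, F&& body) {
        assert(begin <= end);
        for (std::size_t i = begin; i < end; ++i) body(i);
    }
};

// Hypothetical pattern class: splits [0, N) into Chunks pieces at compile time.
template <std::size_t N, std::size_t Chunks, typename Policy = InlinePolicy>
struct StaticParallelFor {
    static_assert(Chunks > 0 && Chunks <= N, "need 1 <= Chunks <= N");

    // Start index of chunk c; chunk_begin(Chunks) equals N.
    static constexpr std::size_t chunk_begin(std::size_t c) {
        return c * N / Chunks;
    }

    template <typename F>
    static void apply(F&& body) {
        apply_from<0>(body, std::true_type{});
    }

private:
    // Recursive case: dispatch chunk C, then continue with chunk C + 1.
    template <std::size_t C, typename F>
    static void apply_from(F& body, std::true_type) {
        Policy::run(chunk_begin(C), chunk_begin(C + 1), body);
        apply_from<C + 1>(body, std::integral_constant<bool, (C + 1 < Chunks)>{});
    }

    // Base case: all chunks have been dispatched.
    template <std::size_t C, typename F>
    static void apply_from(F&, std::false_type) {}
};

// Usage helper: sums i*i for i in [0, 8) using four compile-time chunks.
// The shared accumulator is safe only because InlinePolicy is sequential;
// a threaded policy would need a reduction instead.
inline std::size_t sum_of_squares_demo() {
    std::size_t total = 0;
    StaticParallelFor<8, 4>::apply([&](std::size_t i) { total += i * i; });
    return total;
}
```

Because the chunk count and the policy are template parameters, the decomposition is fixed during compilation and no partitioner or scheduler object exists at run time; this is the sense in which such a pattern stays orthogonal to runtime libraries.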

%\textit{e.g.} Pthread is a
%\textit{de facto} standard interface of multithread for
%POSIX-compatible systems. 
%Furthermore, the implementation of thread on hardware is
%undefined in the standard, so it can not guarantee performance or even
%correctness on some architectures \cite{Boehm05}.

%Entities including partitioner and
%scheduler in TBB are created at run time. In that case, key data
%structures have to be thread-safe. Although TBB exploits task
%parallelism or other sophisticated concurrency on general purpose
%processors, the runtime overhead is relative high in data parallel
%programs, especially in the scenario that many lightweight threads are
%executing by hardware. 


%% MPI
%Another dominant parallel library is MPI in supercomputing community. It based on message-passing mechanism and SPMD model to execute parallel program. The difficulties of developing MPI programs are as notorious as pthread counterpart for programmers without sufficent training of parallel programming. 
%%

%OpenMP ~\cite{openmp} is designed for shared memory and has been shipped in almost every C/Fortran
%compilers.
%OpenMP can only perform Fork-Join parallel model.
%Source-to-Source transformation for optimization was reported by~\cite{Loveman77}, whose granularity of transformation  is
%statement.  Most of works have been merge in forms of IR in
%modern compilers. New source-to-source transformation compilers focus
%on function. 
%The run-time is usually provides in the form of dynamic link library.
%Although it is simple and
%portable, the performance is not optimal in most cases. Moreover, a
%handful of directives in OpenMP leave little room for further improving
%performance or scaling up to larger systems. Hybrid OpenMP with MPI is
%possible though, difficulties surge. 

On the other hand, compiler-based approaches attempt to revise
programming languages per se. They add annotations or language constructs to
support parallel programs. Consequently, programmers
can describe parallel algorithms directly without caring about low-level execution.
OpenMP~\cite{openmp} compilers transform sequential code blocks into
multi-threaded equivalents according to directives. Although OpenMP
is the \textit{de facto} standard for shared-memory systems, it does not
support heterogeneous multicores. To achieve portability for
parallel programs, Sequoia~\cite{sequoia, sequoia-compiler} transforms a \textit{task} into a cluster of
\emph{variants}, and uses the Parallel Memory Hierarchy (PMH)
model~\cite{pmh} to map variants onto physical architectures.
We borrow from Sequoia the idea of transforming and mapping tasks
at compile time. Merge~\cite{merge} is a dynamic map/reduce programming
framework for heterogeneous multicores. It relies on a hierarchical division of tasks and a predicate-based
dispatch system to assign tasks to matching targets at runtime. An obvious shortcoming of compiler-based approaches is that each
approach supports only one kind of parallel pattern.
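
For concreteness, the directive-based style described above for OpenMP can be sketched as follows. The function name \texttt{vector\_add} is a hypothetical example; only the \texttt{\#pragma omp parallel for} directive is standard OpenMP. When OpenMP support is enabled in the compiler, the annotated loop is rewritten into a multi-threaded equivalent; when it is not, the directive is ignored and the loop runs sequentially with the same result.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Element-wise sum of two equally sized vectors.
// The directive asks an OpenMP-enabled compiler to split the loop
// iterations across threads; iterations are independent, so no
// synchronization is needed.
inline std::vector<double> vector_add(const std::vector<double>& a,
                                      const std::vector<double>& b) {
    assert(a.size() == b.size());
    std::vector<double> c(a.size());
    const long n = static_cast<long>(a.size());

    #pragma omp parallel for
    for (long i = 0; i < n; ++i) {
        const std::size_t k = static_cast<std::size_t>(i);
        c[k] = a[k] + b[k];
    }
    return c;
}
```

The sequential source is left intact and the parallel structure is expressed only through the annotation, which is what allows the compiler, rather than the programmer, to decide on the low-level execution.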

We intend to combine the advantages of library-based and
compiler-based approaches. Our approach performs source-to-source
transformations for different multicores using C++ template libraries. We
demonstrate that template metaprogramming is powerful enough to implement multiple
programming models.


%We think it is this process hardwires fixed parallel patterns into the
%compilers. Therefore, we explore the powerness of metaprogramming to
%transform sources for parallelism, which
%can support mulitple parallel programming models while maintain
%portability for mulitcores.

%First of all, it targets execution
%environment as a tree of machines, which an individual machine owns
%its storage and computation unit. Second, Target machine
%is described in XML files. \cite{sequoia} reports that Sequoia can transform programs
%for CellBE, cluster while keeping competitive performance. That
%is at expense of implementing one compiler for each platforms.
%The primary drawback of Sequoia is that its language constructs can not cover common
%parallel patterns such as pipeline or task queue. Methods mentioned before all need non-trivial efforts to
%modify compilers. As discussed in \cite{sequoia}, the authors of the Sequoia were
%still not clear whether the minimal set of primitives they provided provides can
%sufficiently express dynamic applications. We doubt if it is worthwhile to
%invest a compiler given the fact that template library can also
%achieve the same functionalities.

%end of section.